Metagenomic data of bacterial communities associated with Acropora species from Phu Quoc Islands, Vietnam

Abstract
Acropora is one of the most common coral genera found in Phu Quoc Islands, Vietnam. However, marine snails such as the corallivorous gastropod Drupella rugosa are a potential threat to the survival of many scleractinian species, leading to changes in the health status and bacterial diversity of coral reefs in Phu Quoc Islands. Here, we describe the composition of bacterial communities associated with two species of Acropora (Acropora formosa and Acropora millepora) using Illumina sequencing technology. This dataset includes 5 coral samples of each status (grazed or healthy), which were collected in Phu Quoc Islands (9°55′20.6″N 104°01′16.4″E) in May 2020. A total of 19 phyla, 34 classes, 98 orders, 216 families, and 364 bacterial genera were detected from the 10 coral samples. Overall, Proteobacteria and Firmicutes were the two most common bacterial phyla in all samples. Significant differences in the relative abundances of the genera Fusibacter, Halarcobacter, Malaciobacter, and Thalassotalea were observed between the grazed and healthy statuses. However, there were no differences in alpha diversity indices between the two statuses. Furthermore, the dataset analysis indicated that Vibrio and Fusibacter were core genera in the grazed samples, whereas Pseudomonas was the core genus in the healthy samples.



Value of the Data
• This dataset describes the bacterial diversity and community composition associated with Acropora formosa and Acropora millepora corals from Phu Quoc Islands, Vietnam.
• The dataset is a valuable source for comparing bacterial communities between coral species, as well as between host health statuses.
• Based on an understanding of these bacterial communities, the data might be used for metabolic and functional prediction of microbial communities in corals, especially under the stress conditions caused by coral predators.

Objective
This dataset is an important part of the project "Ecogenomics of viruses in two coral reefs in Vietnam: Phu Quoc and Con Dao Islands". In this project, we collected microorganisms to investigate the microbial diversity and ecology associated with coral reefs in Phu Quoc and Con Dao Islands. The data included viruses, bacteria, archaea, and microeukaryota, which were collected from healthy, bleached, and grazed coral species. In this article, we provide the bacterial dataset for comparison and evaluation of bacterial communities associated with two species, Acropora formosa and Acropora millepora. This provides another perspective on bacterial diversity, which is affected not only by abiotic factors but also by biotic factors such as coral predators.

Data Description
The composition of bacterial communities living in Acropora formosa and Acropora millepora was investigated based on 16S rRNA gene sequencing. After removing chimeric, singleton, mitochondrial, and chloroplast sequences, a total of 218,703 reads, with a median of 27,356 and a mean of 21,870 sequences per sample, were obtained from 10 coral mucus samples.
The species richness varied considerably across the samples. However, the rarefaction curves approached a plateau, indicating that the sequencing depth was sufficient (Fig. 1). The raw 16S rRNA gene sequence data have been deposited in GenBank under accession number PRJNA890553.
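As an illustration of what a rarefaction curve measures, the following is a minimal sketch of the classical expected-richness calculation for a random subsample of reads. The counts are hypothetical and the function name is illustrative; this is not the code used to produce Fig. 1.

```python
from math import comb

def rarefied_richness(counts, depth):
    """Expected number of ASVs observed in a random subsample of `depth` reads."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    all_draws = comb(total, depth)
    # An ASV with n reads is missed with probability C(total - n, depth) / C(total, depth).
    return sum(1 - comb(total - n, depth) / all_draws for n in counts if n > 0)

# Hypothetical ASV read counts for one sample (not taken from the dataset).
example_counts = [50, 30, 10, 5, 3, 1, 1]
curve = [(d, rarefied_richness(example_counts, d)) for d in (1, 10, 25, 50, 100)]
for depth, richness in curve:
    print(depth, round(richness, 2))
```

A curve that flattens as depth approaches the full read count suggests that additional sequencing would add few new ASVs.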
The highest number of reads was found in the AMH1 sample (35,524), followed by the AFG1 sample (33,573), whereas the lowest number of reads was found in the AMH2 sample, with only 989 (Table 1).
The number of ASVs ranged from 133 to 175 in the grazed samples and from 50 to 206 in the healthy samples. The mean Shannon indices in the grazed and healthy groups were 3.85 (SD 0.12) and 3.66 (SD 0.67), respectively. Calculations of alpha diversity revealed that sample AFH3 had the highest species richness estimates (Observed: 206; Chao1: 316.56), while those for AMH2 were the lowest (Observed: 50; Chao1: 50.50). However, there was no statistically significant difference in species richness between the grazed and healthy groups (p > 0.05). Similarly, the Shannon diversity index showed no significant difference between the two groups (p > 0.05).
At the genus level, Endozoicomonas, Photobacterium, Algicola, and Vibrio were the ubiquitous genera in the phylum Proteobacteria and in most of the samples. The phylum Actinobacteria was dominated by Candidatus Actinomarina, whilst the phylum Campylobacterota was dominated by Halarcobacter (Fig. 3). A statistical analysis (Wilcoxon rank-sum test) of the top 10 genera revealed significant differences in the mean relative abundance of bacterial genera between the grazed and healthy samples. Specifically, the mean relative abundance of Fusibacter was higher (p = 0.012) in the grazed group (11.46 ± SD 13.7%) than in the healthy group (0.24 ± SD 0.28%). Likewise, the genera Halarcobacter, Malaciobacter, and Thalassotalea had higher mean relative abundances in the grazed group than in the healthy group (p < 0.05, Fig. 4). However, six genera, including Algicola, Endozoicomonas, NS5 marine group, Photobacterium, Thalassolituus, and Vibrio, showed no significant differences in mean relative abundance between the healthy and grazed groups (p > 0.05, Fig. 4).
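For readers who want to repeat this kind of two-group comparison, the sketch below implements a two-sided exact (permutation) version of the Wilcoxon rank-sum test for small groups. The abundance values are hypothetical, and the function names are illustrative. Statistical packages such as R's `wilcox.test` may instead use a normal approximation when ties are present, so p-values from this sketch need not match those reported above.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def rank_sum_exact_p(group_a, group_b):
    """Two-sided exact (permutation) p-value for the Wilcoxon rank-sum test."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n = len(group_a), len(ranks)
    expected = n_a * (n + 1) / 2  # mean rank sum of group A under the null
    observed = abs(sum(ranks[:n_a]) - expected)
    extreme = total = 0
    # Enumerate every way of assigning n_a of the pooled ranks to group A.
    for subset in combinations(range(n), n_a):
        total += 1
        if abs(sum(ranks[i] for i in subset) - expected) >= observed - 1e-9:
            extreme += 1
    return extreme / total

# Hypothetical relative abundances (%) of one genus; not values from the dataset.
grazed = [11.0, 2.5, 30.1, 6.4, 7.3]
healthy = [0.1, 0.7, 0.0, 0.2, 0.3]
p_value = rank_sum_exact_p(grazed, healthy)
print(round(p_value, 4))
```

With five samples per group there are only 252 possible assignments, so complete enumeration is cheap.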
In the current study, an ASV was considered part of the core microbiome if it was present in at least 50% of samples. Using a heatmap plot (Fig. 5), we identified a total of 5 ASVs as the core microbiome across all samples. These ASVs were classified at the genus level into four genera: Vibrio, Fusibacter, Alteromonas, and Pseudomonas.
As shown in Fig. 5, the genera Vibrio and Fusibacter dominated the core microbiome of the grazed samples, whereas Pseudomonas was predominant in the healthy samples.
This is the first dataset on the bacterial communities of grazed and healthy Acropora species collected from Vietnam's Phu Quoc Islands and analyzed by 16S rRNA gene sequencing.

Sample Collection
Colony fragments of Acropora formosa (3 grazed, 2 healthy) and Acropora millepora (2 grazed, 3 healthy), widespread and ubiquitous species present in the global ocean, were collected between 3 m and 5 m depth by scuba diving in shallow coral reefs of Hon Xuong Island, Phu Quoc Islands, Vietnam (9°55′20.6″N 104°01′16.4″E). Coral mucus was collected separately from grazed and healthy coral colonies. The fragments were taken out of the water and exposed to air for 3 minutes. The mucus secretion triggered by this desiccation stress consisted of long gel-like threads dripping from the coral surface. The first 30 seconds of mucus production were discarded to prevent contamination and dilution by seawater. The mucus samples were then collected using sterile syringes and transferred to sterile cryotubes, where they were immediately fixed with 30% glycerol solution at a ratio of 1:1 and stored at -20 °C until further use [1].

DNA Extraction, Library Preparation and Sequencing
DNA was extracted using the Easy-DNA™ gDNA Purification Kit (Invitrogen, Thermo Fisher Scientific, USA) following the manufacturer's instructions, and 500 μl bacterial DNA was extracted from the different coral mucus samples. The partial 16S rRNA gene was amplified from the purified template DNA using the universal bacterial primer set 343F (S-D-Bact-0343-a-S-15, 5′-ACGGRAGGCAGCAG-3′) and 802R (5′-TACCAGGGTATCTAATCCT-3′) [2] and 2X PCR Taq Master Mix (Thermo Fisher Scientific, Waltham, MA, USA). The primers carried specific 6-bp barcode sequences at the 5′ end and were used to produce 16S rRNA amplicons. A ∼460-bp fragment belonging to the V3-V4 region of the 16S rRNA gene was amplified. PCR amplifications were performed in an Eppendorf 6331 Nexus Gradient MasterCycler Thermal Cycler (Hampton, New Hampshire, USA) as follows: an initial denaturation at 94 °C for 5 min, followed by 30 cycles of 94 °C for 30 s, 65 °C for 30 s, and 72 °C for 1 min, and a final extension at 72 °C for 2 min. All amplicons were checked for size and quality by agarose gel electrophoresis. The purified PCR product was used to prepare the DNA library following the DNA library preparation kit protocol, and the 16S rRNA gene amplicons were sequenced on the Illumina MiSeq platform.
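Because the forward primer contains a degenerate base (R = A or G) and is preceded by a 6-bp barcode, locating it in a read requires IUPAC-aware matching. The following is a minimal sketch of that idea, assuming the barcode sits immediately before the primer. The read, the barcode, and the helper names are hypothetical and are not part of the authors' pipeline.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FORWARD_PRIMER = "ACGGRAGGCAGCAG"  # 343F as printed in the text
BARCODE_LENGTH = 6                 # barcodes sit at the 5' end of each primer

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer))

def split_read(read, primer=FORWARD_PRIMER, barcode_length=BARCODE_LENGTH):
    """Return (barcode, insert) if the primer directly follows the barcode, else None."""
    match = primer_regex(primer).match(read, barcode_length)
    if match is None:
        return None
    return read[:barcode_length], read[match.end():]

# Hypothetical read: made-up barcode + primer (R resolved to A) + made-up insert.
read = "ACGTAC" + "ACGGAAGGCAGCAG" + "TGGGGAATATTG"
print(split_read(read))
```

Real demultiplexing tools usually also tolerate mismatches and handle the reverse primer; this sketch requires an exact match to keep the logic visible.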

Data Cleaning and Analyses
The protocol for processing raw sequence reads included filtering and trimming low-quality sequences, denoising, inferring sequence variants, constructing an ASV table, and assigning taxonomy, as described by Callahan et al. [3]. Data analysis was performed using the DADA2 pipeline (version 1.8) in RStudio (version 4.2.1), with some modifications to optimize processing of the dataset. Chimera filtering was performed with the "removeBimeraDenovo" function of the "dada2" package, while taxonomy was assigned using the Silva taxonomic training data formatted for DADA2 (SILVA ribosomal RNA gene database project version 138.1) [4].
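After taxonomy assignment, singleton, mitochondrial, and chloroplast sequences were removed (see Data Description). The sketch below shows one way such a filter could look on an ASV table. It assumes that a singleton is an ASV with a single read across all samples and that organelle sequences are labelled "Chloroplast" at the order rank and "Mitochondria" at the family rank, as is common in SILVA-based assignments; the table contents and names are hypothetical.

```python
# Hypothetical ASV table: ASV id -> (per-sample read counts, taxonomy by rank).
# Taxon names other than the organelle labels are placeholders.
asv_table = {
    "ASV1": ([120, 80, 0], {"Order": "OrderA", "Family": "FamilyA"}),
    "ASV2": ([40, 0, 15], {"Order": "Chloroplast", "Family": None}),
    "ASV3": ([0, 9, 3], {"Order": "OrderB", "Family": "Mitochondria"}),
    "ASV4": ([1, 0, 0], {"Order": "OrderC", "Family": "FamilyC"}),
    "ASV5": ([0, 2, 7], {"Order": None, "Family": None}),
}

# Assumed organelle labels and the ranks at which they appear.
EXCLUDED = {"Order": "Chloroplast", "Family": "Mitochondria"}

def keep_asv(counts, taxonomy, min_total=2):
    """Keep an ASV unless it is a singleton or assigned to an organelle."""
    if sum(counts) < min_total:  # singleton: a single read across all samples
        return False
    return all(taxonomy.get(rank) != label for rank, label in EXCLUDED.items())

filtered = {asv: record for asv, record in asv_table.items() if keep_asv(*record)}
print(sorted(filtered))
```

ASVs without an assignment at a given rank are kept here; whether to keep or drop them is a separate decision.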
Alpha diversity metrics, including observed ASVs, the Chao1 richness estimator, and the Shannon-Weaver index, were calculated using the vegan package in R [5] to compare the microbial diversity of the samples. As the majority of datasets did not meet the assumption of normality, the Wilcoxon rank-sum test was used to compare the relative abundance of bacterial taxa (phyla and genera) between the grazed and healthy groups. A p-value of < 0.05 was considered statistically significant.
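To make the three alpha diversity metrics concrete, the following sketch computes them from a vector of per-ASV read counts. It uses the natural-log Shannon index and the bias-corrected form of Chao1; package implementations may differ in detail, and the counts are hypothetical rather than taken from the dataset.

```python
from math import log

def observed_richness(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts if n > 0)

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over ASVs with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * log(n / total) for n in counts if n > 0)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    f1 = sum(1 for n in counts if n == 1)  # ASVs seen exactly once
    f2 = sum(1 for n in counts if n == 2)  # ASVs seen exactly twice
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical per-ASV read counts for one sample.
sample_counts = [10, 5, 2, 2, 1, 1, 1, 0]
print(observed_richness(sample_counts))
print(round(shannon_index(sample_counts), 3))
print(chao1(sample_counts))
```

Chao1 equals the observed richness when no ASV is seen exactly once, which is why it depends strongly on how rare sequences are filtered.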
The core microbiome of the coral genus was identified using the "microbiome" package and illustrated with a Venn diagram. Core ASVs in coral microbiomes have been defined using different percentage cut-offs ranging from 30% to 100% [6]. In the current study, the presence of an ASV in at least 50% of samples was chosen as a conservative representation of the core microbiome.
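The 50% prevalence rule can be expressed in a few lines. The sketch below is a plain illustration of the threshold, not the "microbiome" package's implementation; the counts, ASV names, and the `detection` parameter are hypothetical.

```python
def core_asvs(asv_counts, min_prevalence=0.5, detection=0):
    """Return ASVs with count > detection in at least `min_prevalence` of samples."""
    core = {}
    for asv, counts in asv_counts.items():
        prevalence = sum(1 for n in counts if n > detection) / len(counts)
        if prevalence >= min_prevalence:
            core[asv] = prevalence
    return core

# Hypothetical read counts across 10 samples.
asv_counts = {
    "ASV_a": [5, 0, 3, 8, 1, 0, 0, 2, 0, 4],  # present in 6 of 10 samples
    "ASV_b": [0, 0, 0, 9, 0, 0, 1, 0, 0, 0],  # present in 2 of 10 samples
    "ASV_c": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # present in 5 of 10 samples
}
core = core_asvs(asv_counts)
print(core)
```

Applying the same function to the grazed and healthy samples separately would give group-specific core sets that can then be compared.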

Ethics Statements
This dataset raises no human or animal ethics concerns.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.